Eleven years ago, the mathematician Stefaan Vaes, upon receiving the prestigious Francqui Prize for young scientists from the hands of Belgium’s Queen Mathilde, took the opportunity to tell Her Majesty why we mathematicians choose such a misunderstood profession. “Omdat wij dit graag doen,” he said—because we like it. A few years before that, another Belgian, Pierre Deligne, on the occasion of his selection for the Abel Prize—often called the mathematician’s Nobel Prize—explained that he decided on a career in mathematical research when he learned that “one could earn one’s living by playing.”
What both Belgians were saying is that mathematical research constitutes what Henri Poincaré, the great French mathematician, called a “free creative art”—one of the rare remaining examples of unalienated labor. For how much longer?
AI has been targeting mathematics, as both a challenge and a trophy, since its beginnings in the 1950s. Three years after the phrase “artificial intelligence” was coined in the run-up to a legendary 1956 conference at Dartmouth, Alan Newell and Herbert Simon predicted that within a decade, digital computers would reach four milestones—matching human capabilities in mathematics, music, chess, and psychotherapy—on the road to a “world in which [human] intellectual power and speed are outstripped by the intelligence of machines.” The deadlines, periodically revived, came and went; finally, in 1997, IBM’s Deep Blue defeated Garry Kasparov. Progress toward the other milestones largely stalled until the deep learning revolution of the 2010s, when the Newell-Simon program took on new life, amplified by the colossal wealth and ambition of Silicon Valley.
With the rise of generative AI, we are now seeing regular predictions that human mathematical power is at last on the verge of being outstripped, whether or not anyone likes it. From the New York Times publishing an article under the title “AI Is Coming for Mathematics, Too,” to serious journals like Science and Nature repeating industry talking points, the general public is repeatedly being told that mathematics will be the next domino to fall on the inevitable march to artificial general intelligence (AGI). Among mathematicians, versions of the following syllogism have been circulating in recent years with increasing urgency:
- They said an AI could never solve Go
- But AlphaZero solved Go
- Go is a game
- Mathematics is also a game [see “playing”]
- Therefore, AI will “solve” math.
The annual International Mathematical Olympiad, once a contest for talented high school students, is now the occasion for DeepMind and OpenAI to announce the success of their latest models: silver medal in 2024, gold medal in 2025. Students training to be mathematicians—because they like it—are asking whether their chosen life path has a human future, and the mathematical community has generated its own versions of boomers and doomers.
“Computational Intelligence is the manifest destiny of computer science,” Edward Feigenbaum wrote more than twenty years ago: “the goal, the destination, the final frontier.” Mathematical boomers share Feigenbaum’s romantic vision; in this they are encouraged by funding agencies like the National Science Foundation (NSF), whose new Institute for Computer-Aided Reasoning in Mathematics at Carnegie Mellon aims to “empower mathematicians to take advantage of new technologies for mathematical reasoning,” as well as by established industry labs and startups like Axiom, which promises, “We stand at the threshold of a mathematical renaissance.” The collective market capitalization of such ventures would suffice to fund 200 math PhDs—the number lost this year in the United States due to cuts to graduate programs—every year for the next 500 years.
For their part, the doomers take the syllogism literally: they agree with the New Scientist that AI “is rewriting what it means to be a mathematician,” and while they “are still hopeful there will be a place for them in an increasingly machine-led future,” they anticipate “a world in which AI mathematicians take humans ‘out of the loop’ entirely.” Meanwhile, to the simmering frustration of those keyed in to our profession’s purported manifest destiny, the vast majority of mathematicians go about their routine, and like it, oblivious to the looming boom and/or doom.
To those not already convinced that work ought to be a source of pleasure, talk of earning one’s living by playing doesn’t sound very serious, so mathematicians typically look for a more respectable name for what we seek. Given the subject’s reputation for impenetrability, it might be surprising to learn how often we choose the word understanding. “You do mathematics,” the number theorist Barry Mazur writes, “because you want to apply your findings to real-life questions or because you are seeking some pure understanding.” The word “understand” appears in the abstract, often in the first sentence, in nine out of ten NSF grant proposals from my own department this year. The classic formulation of this attitude was given in a famous article by Fields Medalist William Thurston: “It may sound almost circular to say that what mathematicians are accomplishing is to advance human understanding of mathematics.”
The market capitalization of math-related labs and startups would suffice to fund 200 math PhDs—the number lost this year in the United States—every year for the next 500 years.
But what is this understanding? In avoiding circularity we retreat into poetry: “it happens suddenly: one direction becomes more dense, or more luminous,” in the words of mathematician Marie-France Vignéras. Seeking clarity in the Dictionary of Untranslatables (2014), we read that “understanding . . . inherits a long conceptual history that contrasts it, as an act of intuition (Greek νοῦς) with rational discursive acts (Gr. διανοια), and it is defined in contradistinction to reason (Latin ratio).”
The fact that it is hard to be rationally discursive about mathematical understanding was already familiar in antiquity, if we are to trust Hannah Arendt. In The Human Condition (1958), she cites Aristotle’s Nicomachean Ethics to defend her claim that the “chief characteristic” of νοῦς “is that its contents cannot be rendered in speech.” But when you dig through the mutually incompatible translations of that passage in Aristotle, you find that he illustrates this elusive quality of νοῦς by comparing it to the immediate perception of a triangle—the paradigm of a mathematician’s understanding, not accessible to ratio, or what the Greeks called λογος.
Philosophers of mathematics are on the case. In the last two years alone, Yacin Hamami and Rebecca Lea Morris, Jeremy Avigad, Jessica Carter, and Michael Friedman and Kati Kish-Bar-On have all made efforts to pin down mathematical understanding and label its parts. And since they are philosophers, no two of them understand understanding, to say nothing of its mathematical subspecies, in the same way. Does it amount to “rationally reconstructing” the “underlying plan” of a proof, or “possession of abilities that enable one to carry out . . . reasoning successfully,” or a “relation between an agent and a selected part of mathematics” that typically involves “unification, abstraction, visualization, and mechanisms” but cannot be reduced to any one of these aspects? Whether you believe (a) that human and AI mathematicians understand in the same way, or (b) that the latter are intruders from a new epistemic species, or even (c) that understanding is irrelevant to AI mathematics, may hinge on your favorite philosopher’s answer to this question.
Understanding is a lively topic for philosophers, but not for the tech industry. In their race to the ultimate prize of AGI, Silicon Valley’s main players instead see the mechanization of reasoning as the main hurdle. For them, mathematics is the supreme AI challenge because it is the purest form of reasoning. Human mathematicians, from their perspective, are saddled with a beta version of intelligence, reliant on vaporous experiences like understanding that are of no commercial value. A machine, by contrast, can function without anyone worrying whether or not it understands what it’s doing, much less whether the machine likes it.
And yet, there’s no denying that electronic computers have transformed mathematics in the seventy years since the Dartmouth meeting, though not thanks to their “intelligence.” In the early 1960s, number theorists Bryan Birch and Peter Swinnerton-Dyer formulated a conjecture on the number of rational solutions of a cubic equation in two variables, based on computer calculations. They used the Cambridge University EDSAC 2, whose 5 kilobytes of RAM amounts to 0.0000000003125 percent of the GPU VRAM used to train ChatGPT. The conjecture—one of the most influential of all time, and among the six unresolved million-dollar Millennium Prize Problems of the Clay Mathematics Institute—remains computing’s most consequential contribution to mathematics. No mathematician now would think of stating a far-reaching conjecture without first testing it numerically on a computer, if such a test is at all possible.
Computing’s next incursion into mathematics was more fraught. In 1976 Kenneth Appel and Wolfgang Haken announced that a four-year long “man-machine dialogue” had produced a proof of the nearly century-old Four Color Theorem—the statement that the distinct regions on any map, no matter how large, can be decorated with four colors in such a way that no two regions of the same color shared a border—using 1,200 hours of computing time. Sociologist Donald MacKenzie’s book Mechanizing Proof (2001) devoted an entire chapter—with the title “Eden Defiled”—to the controversy stirred by the Appel-Haken announcement. Philosopher Thomas Tymoczko thought that accepting their result meant “changing the sense of the underlying concept of ‘proof.’” No one seriously doubted that the arguments underlying the computer calculations were correct; the objection instead hinged on the separation of proof from our old friend understanding, as in this reaction by British mathematician Frank Bonsall: “We cannot possibly achieve what I regard as the essential element of a proof—our own personal understanding—if part of the argument is hidden away in a box.” Yuri Ivanovich Manin, most philosophical among the creators of the brilliant postwar Moscow mathematical school, explicated mathematical understanding with an epigram: “A good proof,” he explained, “is a proof that makes us wiser.”
Discomfort over the status of proofs dependent on inscrutable computer calculations resurfaced in 1998, when Tom Hales completed his proof of Kepler’s conjecture—that the most efficient way to pack three-dimensional space with equally sized spheres is the one grocers use when stacking oranges—with the help of his graduate student Samuel Ferguson and a computer program that ran to over 100,000 lines of code. The Annals of Mathematics, perhaps the most respected of all mathematics journals, eventually published the proof after cutting short the referee process—the referees “became exhausted and quit” after years of effort to verify the proof.
To quiet any lingering doubts as to the proof’s correctness, Hales launched the international Flyspeck project in 2003 to translate his proof from the vernacular of normal mathematical discourse into a step-by-step recipe in a language that resembles computer code and can be read by a machine, a language designed to produce mathematical statements unambiguously and infallibly, like identical tubes of toothpaste. Eleven years later, and nine years ahead of schedule, Hales and his team announced the completion of this formal proof. “This technology cuts the mathematical referees out of the verification process,” Hales told New Scientist. “Their opinion about the correctness of the proof no longer matters.” No claims were made as to whether the formalization process made anyone, or anything, wiser.
The idea of formalization goes back to Leibniz’s dream of a “universal characteristic,” a language that eliminates ambiguity and allows the correct answer to any question—moral, political, or scientific—to be determined by calculation. At the turn of the twentieth century, this program was advanced by a new formal symbolism developed in the hopes of banishing paradoxes and placing mathematics on irrefutable logical foundations. In formal mathematics, every sentence is a sequence of unambiguous logical symbols, and a proof is a sequence of statements, starting from the accepted axioms and proceeding by application of the accepted rules of deduction, concluding with the statement of the theorem being proved. The formalism of David Hilbert, who along with Poincaré dominated the mathematics of the time, anticipated that every mathematical claim could be rewritten in a formal language in such a way that error would be impossible, as in Leibniz’s system.
Here’s how formalism works with the syllogism about how AlphaZero portends the death of human mathematics. The first line gets formalized as
- [every] t [in] T, σ(t) [not][exist] α [in] AI, S(G)
This says precisely that “every t in the set T (‘they’) said (there does) not exist an in the set AI such that does S to G.” Continuing in this way,
2. [in] AI and S(G)
3. G [in]
4. M [in]
5. [Therefore] [exists] [in] AI and S(M).
Here we can translate as AlphaZero, S as “solves,” G as “Go,” as the set of “games,” and M as “math,” and we recover the previous syllogism: some AI named does to math what AlphaZero did to Go. But for the formalist, M could also be “monkey” or “Manitoba” and S(G) could be “steals a grocery” or “shits a golf ball”; the conclusion in line 5 remains valid, or invalid, depending only on the syntax. And in this case the formalism does its work! We can see that the argument is invalid on purely syntactic grounds. Setting aside the dubious claim in line four (that math is a game), the formalism lays bare the flawed logic of the supposed syllogism: none of the lines asserts that whatever is true of one thing in is true of every other thing in .
The appeal of Hilbert’s program rested on two assumptions: that mathematical truth could be captured by a formal system, and that a fully formalized proof could be verified, inspected mechanically for validity. Kurt Gödel’s incompleteness theorems of 1931 dealt a death blow to the strongest version of Hilbert’s program by showing that the first assumption was hopeless. But the second is true, as our identification of the syllogism’s flaw demonstrates: extrapolation is not a valid logical move in our formal system.
Formalization remained a niche interest in the discipline for a while after Gödel, but it came to the surface in the December 2008 issue of the Notices of the American Mathematical Society. With his Flyspeck project halfway to completion, Hales wrote the first of four articles on the subject. In the spirit “that regards the foundational enterprise as unfinished until it is realized in practice and written down in full,” he explained how human proofs are translated into the formal systems that had been invented over the past few decades and reported on some of the most recent successes in formalization—notably Georges Gonthier’s formalized proof of the Four Color Theorem, which Gonthier himself described in a second article.
At their most rhapsodic, boomers would have you believe AI companies aim to democratize the creation of knowledge. Back on this planet, these are textbook cases of alienated labor.
It was left to John Harrison, principal engineer at Intel, to explain why mathematicians should embrace formalization, which he called “a natural further step” in the constant evolution of math “towards greater clarity and precision.” “If the underlying mathematics” of bridges and aircraft “is in doubt,” asked Harrison, “then how can we trust these engineering artifacts?” By analogy, Harrison cited the importance of formal verification of computer systems. This is unsurprising; MacKenzie’s book makes clear that the precursors of formal languages used by mathematicians today—Coq/Rocq, Mizar, Isabelle, HOL, and (the current favorite) Lean—were invented for software and hardware verification. But Harrison’s first “specific goal,” to insert “an objective and mechanizable criterion for the correctness of proofs” into mathematical peer review, was more of a stretch; MacKenzie notes that similar projects had “sparked only limited interest within the mainstream mathematical community.”
Attitudes toward machine proof-checking began to change about ten years ago, just before the explosive growth of AI as a consumer interface, a primary motor of economic growth, and the centerpiece of a new geopolitical great game. The history of this change has yet to be written, but undoubtedly it came about partly because formalization had matured into a self-sustaining branch of mathematics in its own right, partly because of the enthusiasm around the completion of the Flyspeck project, and more generally because of the impressive performance of transformer-based deep learning models and the increasing integration of AI into computer systems. By the time industry started predicting the imminent arrival of AGI—in 2019 Christian Szegedy, then at Google, predicted a superhuman mathematician would emerge by 2029—the formalization agenda had taken the concrete human form of the Lean community, and a growing number of academic mathematicians, in collaboration with industry labs or in their own universities, had begun reporting on experiments with AI mathematics. As of this June, the Lean community counts 437 members, while a website curated by computational number theorist Seewoo Lee lists 170 “awesome papers exploring the use of artificial intelligence / machine learning / deep learning for mathematical discoveries.”
Proof checking and proof generation are not the same thing, but the two are deeply intertwined in the vision of artificial superhuman mathematics. Formalizers break down mathematical arguments into their syntax, as in the syllogism above. Emily Riehl has argued that translating mathematics from natural language into a version readable by a computer proof assistant reveals “how much ‘invisible mathematics’ is obscured in a proof on paper” and has integrated formalization into her undergraduate teaching at Johns Hopkins. Others point to the syntactic analysis of proofs required for formalization as an aid to understanding.
But the principal benefit claimed for formalization is the guarantee it provides for correctness. Though errors have fewer practical consequences in mathematical reasoning than in the computer systems for which formalization was invented, as MacKenzie and Harrison remind us, artificial mathematicians built on generative AI will hardly qualify as superhuman unless their arguments are valid. This is where proof checkers come in: by rejecting hallucinations, they vouch for the logical validity of any deductions they manage to compile. On their own, however, they do not make us wiser.
When Silicon Valley’s prophets of superhuman mathematics do violence to the values of human mathematics, in particular to the insatiable desire for understanding, they are forgetting that the technology to which they owe their fearful power derives from the history of that very desire. Consider the following fable, largely true, about the intertwined histories of calculus, number theory, and the most implacable symbolic logic.
Zeno of Elea, toward the beginning of the Western philosophical tradition, wondered how an arrow could ever fly at all, given that to reach its destination it first had to go halfway, then half of the remaining distance, then half again, and so on, passing through infinitely many intermediate stages before impact. Yet even in Zeno’s days arrows routinely did hit their targets. The modern solution to Zeno’s paradox is expressed in the formula
where that ellipsis is just another way of saying, if you continue adding half the previous number forever, the sum comes out to 1.
The usual rules of arithmetic do not allow for infinitely many additions. Nor do the laws of physics. Even if mathematical superintelligence, for whose invention the startup Harmonic has raised over a quarter of a billion dollars, took no more than a trillionth of a nanosecond to carry out an addition, it would still take infinitely long before the arrow ended its flight. Though Archimedes anticipated the concept of infinite sums, consistent rules for manipulating them to obtain finite results were only developed with the invention of calculus in the late seventeenth century.
In this way mathematics offered a concept, in the form of an understanding—how to sum an infinite series—as a belated gift to philosophy. But the new concept raised its own questions. Some infinite sums give a finite answer, like Zeno’s, while others—for example
—do not. Standard techniques can tell which kind of infinite series we’re dealing with, but they work only in very specific, rather contrived situations. Naturally enough, these are the situations that are taught in a first-year calculus course. Thus students learn that the sum
is infinite (this was already known in the fourteenth century), while the sums
and so on (you can guess the rule) are finite.
But knowing that a sum is finite doesn’t tell you what it equals. The first and last series on this list have long been known to sum to simple expressions involving ; the one in the middle has a more complicated story. Sorting this all out so that mathematicians could talk to each other about infinity and be sure of being understood took until the end of the nineteenth century.
At that point the theory of infinite sums branched off in several directions. Focusing on the ways an infinite sum can fail to give a finite answer inspired Georg Cantor to develop his theory of sets, which, in turn, as filtered through the paradoxes of mathematical logic, offered an extraordinary gift to technology—the theory of computation—and thus, skipping ahead a century or so, to the greatest concentration of wealth and power ever seen in a single industry, and thus to the promise to make human mathematicians obsolete.
Physicists, meanwhile, continue to build successful theories on the manipulation of infinite sums that make no mathematical sense. The mathematical mainstream, for its part, has adapted the new rules for handling infinite sums to its own purposes, namely the pursuit of understanding. Nineteenth-century mathematics was still trying to articulate the rules when the German mathematician Bernhard Riemann put them to a novel use. Riemann assembled all the infinite sums that share the general shape of the last three displayed above into a structure now called the Riemann zeta function, written . What he discovered, by manipulating these sums according to the rules then under development, was that this function is intimately related to the way prime numbers are scattered among all the positive numbers.
Primes, you’ll recall, are the numbers, like , , , , and , that cannot be written as products of smaller numbers, unlike or . The list of primes never ends—Euclid already proved that—but there’s no rule that tells you how to generate the list. The Riemann Hypothesis (RH) asserts a property of which, if true, would imply that the distribution of prime numbers is, in a certain sense, as random as possible.
This conjecture is the most notorious of the unsolved Millennium Prize Problems, and when we mathematicians talk or write or obsess over the prospect of being made obsolete at AI’s invisible hands, the point of no return often takes the form of an AI proof of the Riemann Hypothesis. The AI companies couch their math fantasies in the same terms. As they tell it, their resolution of RH would be a sign that some epistemic pendulum has swung definitively in their favor. Many mathematicians narrate this same story as a nightmare: a million-line proof that no less definitively marks the defeat of human understanding.
The tokens of mathematical understanding can take many forms—proofs, diagrams, definitions—but their common feature is that they have to be able to be freely available, ready to be shared.
But why does it matter to mathematicians whether the primes are distributed randomly, so much so that they have been looking for a proof for 167 years? Not for the reason filmmaker Werner Herzog suggested some years ago: that it’s central to the human quest to grasp “The Absolute,” and that “mathematics will probably come closest to getting it when someone finally proves Riemann’s hypothesis.” The real answer is: if what RH asserts were not the case—if there were some pattern to the primes—mathematicians would want to know why. Fields Medalist Timothy Gowers calls this the “no-coincidence principle”: if an apparently outrageous coincidence happens in mathematics, there is a reason for it. Conversely, Gowers could have added, if there is no outrageous coincidence, there is a reason for that as well.
Following Deligne, we might refine our sense of who mathematicians are by saying that they are people for whom figuring out such reasons feels like playing. Or we might say that reasons of this kind are the tokens of mathematical understanding. These tokens can take many forms—proofs of theorems, diagrams, definitions of new unifying concepts—but their common feature is that they have to be able to be freely available, ready to be shared; otherwise they don’t count as mathematics. Mathematics only makes sense as a gift economy. It has no trade secrets; its understanding can’t be monetized.
But why, then, does it matter so much to Silicon Valley whether the primes are distributed randomly? Why does Surge AI—which counts Harvard, Oxford, NASA, and the Olympics, Goldman Sachs and Navy SEALs among its clients—promise on its home page to hire “the world’s greatest minds” to “train AI to explore the Riemann Hypothesis and beyond”? Since prime numbers are the basis of the encryption systems used to guarantee security on the internet, some claim a proof of RH may have practical implications. I find this totally unconvincing. Computer calculations have confirmed RH with far more accuracy than any of the physical theories that underlie contemporary technology, without going to the bother of paving northern Virginia or the surface of Mars with data centers and building nuclear reactors to power them. Such confirmation is as good as a mathematical proof for all commercial purposes.
At the annual national mathematics conference this January, Geordie Williamson, an Australian mathematician whose experiments with AI were among the first academy-industry collaborations to be widely publicized, warned a packed auditorium that “industry AI labs (and government) are not primarily motivated by intellectual curiosity.” Mathematics motivates them for the reason already noted: “solving” mathematics, as they put it, is an obligatory step on the way to their real goal. They call the goal AGI, or ASI (artificial superintelligence), but whatever name they choose, its intrinsic worth is measured in money. As Paul Ford wrote in the New York Times this February, “much of the AI industry is led by people who see human thought as raw material, like a steel manufacturer sees ore.” Sam Altman, OpenAI’s CEO, was clear enough about the goal at a BlackRock summit in March: “a future where intelligence is a utility, like electricity or water, and people buy it from us on a meter.”
They are also interested in the Riemann Hypothesis as a way to get your attention—your attention, dear reader!—and the attention of the likes of Herzog, because, although the million dollars of the Millennium Prize may be less than what Google spent on AI every ten minutes in 2025, it is still the kind of trophy, like the 2024 Nobel Prize awarded to Google DeepMind’s CEO for AlphaFold, that can consecrate its winner with cultural capital—which, by a miraculous transmutation, translates into colossal investments of actual capital.
In late January, a message from something called Handshake AI appeared in inboxes in my department:
Dear Professor,
I hope this email finds you well. I’m reaching out from Handshake AI with an invitation to a selective, paid research project involving expert evaluation of AI systems. We’re assembling a small group of senior researchers with expertise in Analysis, Number Theory, Topology, Algebra/Geometry to help train an AI model to read, interpret, and summarize peer-reviewed scientific literature.
Project details:
• Compensation: $185–$400/hour, based on background and experience
• Time commitment: ~10 hours/week, flexible
• Start: Rolling, as soon as next Monday, Feb 2
• Duration: Through the end of February, with likely extension
It’s not one of those nine-figure signing bonuses we read about, and the decades I’ve spent acquiring expertise in number theory hardly amount to the right kind of background and experience, but an extra $1,850 to $4,000 per week sounds pretty good, right?
At their most rhapsodic, AI boomers would have you believe that such projects aim to democratize the creation and production of knowledge, to bring into being a kind of epistemic communism that, to update a famous passage from Marx’s German Ideology, makes it possible for me to do one thing today and another tomorrow, to hunt in the morning, fish in the afternoon, rear cattle in the evening, mathematize after dinner . . . without ever becoming hunter, fisherman, herdsman, or mathematician.
Back on this planet, the Handshake AI offer is a textbook case of alienated labor. The results of the work belong unequivocally to the firm that commissioned the project, whose identity presumably is never shared with the “experts.” In other words, Handshake’s email was an invitation to claim a spot on a spectrum of “ghost work” or “microwork,” not unlike tagging images or filtering videos, as one of the “human robots” who are “almost indistinguishable from software units,” in the words of sociologist Antonio Casilli. Human robots with few credentials, like Amazon’s Mechanical Turk, have been paid less than $2 an hour; “senior researchers with expertise,” Handshake tells us, can earn a hundred to two hundred times as much. The relation to the finished product is the same in both cases: “the degradation of men into commodities,” to quote Arendt; or as Jeff Bezos put it, “this is basically people-as-a-service.”
This essay will quickly fall out of date; much of what I wrote in its first draft is already out of date. The Angel of History looks on, puzzled, as the press releases pile up. An industry that sees thinking machines as debugged versions of human beings makes increasingly bold claims for their models’ performance on a series of increasingly ambitious benchmarks. This January saw a “more or less autonomous” solution of one of more than a thousand problems proposed last century by Paul Erdős. In February both OpenAI and Google DeepMind claimed success, controversially, in solving most of ten problems designed by an international group of prominent mathematicians, in a project called First Proof. March brought the news that Aletheia, another DeepMind project, had proved an interesting theorem in algebraic geometry. Yet another Erdős problem fell in late May, arguably of less intrinsic interest than Aletheia’s theorem but treated by the international media as a world-historical breakthrough thanks to the Erdős connection—which here provides something like brand identification—and OpenAI’s massive public relations operation, even more shrill than usual ahead of its planned IPO.
Every month after this essay is published will bring its own lot of similar announcements. Mathematicians outside the industry labs are greeting the news with a combination of excitement, foreboding, and frustration. Press headlines overwhelmingly focus on the foreboding (“mathematicians struggle to foretell their own future”) and the excitement (“mathematics is undergoing the biggest change in its history”). Count me among the frustrated. First Proof’s authors were motivated by the industry’s lack of attention to “research-level math questions” and have criticized submissions from OpenAI and DeepMind for their lack of transparency, especially the latter’s “undisclosed human guidance” that violated First Proof’s rules. Even Aletheia’s March proof, presented as completely autonomous, was the result of a training process whose details are veiled by commercial secrecy and nondisclosure agreements. All we know is that the article introducing Aletheia was signed by twenty-eight coauthors, only a quarter of them mathematicians. How much of its claimed autonomy derives from the scaffolding produced by these “people-as-a-service”?
A recent paper by a team of economists led by Nobel laureate Daron Acemoglu warns that excessive reliance on agentic AI to solve context-specific problems—the kind common to all the benchmarks hitherto proposed as a test of artificial mathematical reasoning—can lead to “knowledge collapse.” The economists’ model displays a “long-run equilibrium” in which “all human knowledge is ultimately destroyed.” A more immediately realistic risk for mathematics is the collapse of its publishing model in the face of a flood of superficially plausible papers generated by LLMs. Peer review simply cannot cope with checking the correctness of all these papers, and a proposed solution—to require every submitted paper to be accompanied by a machine-checkable version in a formal language—raises a philosophical conundrum: How do we know the formal version expresses the same understanding as the human-readable version? That is to say nothing of the absurd prospect of turning mathematics into a dialogue between LLMs.
First Proof’s first round reveals a subtler risk to human mathematical understanding. Science fiction has long asked: If and when aliens visit, how will we communicate? Carl Sagan’s novel Contact (1985), and the movie based on the book, suggests that mathematics, in the form of prime numbers, provides a universal “cosmic ‘hello.’” What most surprised the First Proof team was that even communication with the alien data processing devices that the industry labs set loose on their problems is not so straightforward. “Whereas human mathematicians are part of a community,” First Proof author Andrew Blumberg told me,
and are sensitive to reputational effects, which mean that there is a predisposition to believe that a proof produced by a working mathematician was done ‘in good faith,’ the machines have no such constraints and so can produce an endless succession of wrong proofs.
Whatever the future of benchmarking, the First Proof experiment contains a valuable lesson: understanding is not simply a matter of translating from one language (English, for example) into another (such as 0s and 1s) and back again. Like any other species of understanding, the mathematical variety can only be established on the basis of a shared culture. Substituting a monetizable ersatz culture for the one evolved through historical experience is a particularly insidious form of knowledge collapse.
In spite of the growing AI backlash, mathematicians for the most part have been absent from the public conversation about AI. That is finally beginning to change. A group of mathematicians who met last September in the Netherlands for a workshop on “Mechanization and Mathematical Research”—I was among them—have collaborated on the first code of ethics for AI and mathematics. In addition to addressing issues specific to mathematical work, our Leiden Declaration, published last week, takes up the kinds of questions that fuel debate in the broader society: respect for the rights of authors, environmental cost, social cost, military and surveillance applications.
The collective that came together to draft the text includes mathematicians, computer scientists, philosophers, and social scientists, and their relations to the technology range from total avoidance to collaboration with industry labs to government consulting. Although the paragraphs on what mathematicians expect from the industry were the result of compromise, a clause echoing the forty-year-old Uppsala Code of Ethics for Scientists—that mathematicians should not participate in “the development of technology for use in warfare, oppression, mass surveillance, or the undermining of democracy”—was the object of complete consensus.
Press coverage of the excitement generated by automated mathematics overwhelmingly focuses on comments by mathematical superstars on the latest announcements by industry labs. Given that the war against Iran has blurred the line between AI firms and defense contractors, this tendency places mathematicians who may be excited about the potential use of the technology but who subscribe to the principles of the Leiden Declaration in an uncomfortable position. Beyond the glare of publicity, however, small-scale projects are increasingly showing that it is possible to use AI and related technology to enhance human mathematical understanding while bypassing the commercial product. Lee’s list of “awesome papers” created with the help of AI includes one supported by Swiss public funds—Fields Medalist Maryna Viazovska is among its authors—and another that used a hundred-line machine learning algorithm produced on a laptop.
“Mathematicians have a choice about whether and how to adopt artificial intelligence in the conduct of their research,” says the Leiden Declaration. Small-scale projects like those just mentioned contribute to the “continued flourishing of the discipline” on which the Declaration insists when it warns that tech industry involvement “put the autonomy of mathematics under threat.” For “autonomy” read “unalienated labor”; for “flourishing” read “understanding.” Some media coverage of the Declaration has seen it as an expression of anxiety; Science called it “a public cry for help.” I interpret it quite differently: as an expression of optimism, as the belief that mathematicians still “have a choice” to maintain our gift economy, and that there will still be a place for human beings to devote their lives and their energy to mathematics, or to anything else, because they like it.
Independent and nonprofit, Boston Review relies on reader funding. To support work like this, please donate here.